Smart Paper Notes

Unsupervised Object Localization: Observing the Background to Discover Objects

Oriane Siméoni, Chloé Sekkat, Gilles Puy, Antonin Vobecky, Éloi Zablocki, Patrick Pérez

Category: Computer Vision

2022-12-15

Recent advances in self-supervised visual representation learning have paved the way for unsupervised methods tackling tasks such as object discovery and instance segmentation. However, discovering objects in an image with no supervision is a very hard task: what are the desired objects, when should they be separated into parts, how many are there, and of what classes? The answers to these questions depend on the tasks and datasets of evaluation. In this work, we take a different approach and propose to look for the background instead. This way, the salient objects emerge as a by-product without any strong assumption on what an object should be. We propose FOUND, a simple model made of a single $1\times1$ convolution initialized with coarse background masks extracted from self-supervised patch-based representations. After fast training and refining these seed masks, the model reaches state-of-the-art results on unsupervised saliency detection and object discovery benchmarks. Moreover, we show that our approach yields good results in the unsupervised semantic segmentation retrieval task. The code to reproduce our results is available at https://github.com/valeoai/FOUND.
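
To make the "single $1\times1$ convolution" concrete: applied to a grid of patch features, a $1\times1$ convolution with one output channel is the same linear classifier applied to every patch independently. The sketch below illustrates only that idea in plain Python. It is not the authors' implementation (see the repository linked above for that). The function names, the toy weights, the 0.5 threshold, and the choice to read the output as a background probability are all assumptions made for illustration.

```python
import math


def conv1x1_scores(features, weights, bias=0.0):
    """Apply a one-channel 1x1 convolution followed by a sigmoid.

    features: H x W grid (nested lists) of D-dimensional patch feature vectors.
    weights:  D coefficients shared by every patch (hypothetical values here).
    Returns an H x W grid of scores in (0, 1).
    """
    scores = []
    for row in features:
        score_row = []
        for patch in row:
            # Same dot product at every spatial location: that is all a 1x1 conv does.
            logit = sum(w * x for w, x in zip(weights, patch)) + bias
            score_row.append(1.0 / (1.0 + math.exp(-logit)))
        scores.append(score_row)
    return scores


def object_mask(background_scores, threshold=0.5):
    """Assumption: scores are background probabilities, so objects are the complement."""
    return [[s < threshold for s in row] for row in background_scores]


# Toy 2 x 2 patch grid with 2-dimensional features (made-up numbers).
toy_features = [
    [[2.0, 0.0], [0.0, 2.0]],
    [[2.0, 0.0], [2.0, 0.0]],
]
toy_weights = [1.0, -1.0]

toy_scores = conv1x1_scores(toy_features, toy_weights)
toy_mask = object_mask(toy_scores)
print(toy_mask)  # [[False, True], [False, False]]
```

In this toy grid only the top-right patch scores below the threshold, so it is the only patch marked as object. The object region is obtained as whatever is not predicted to be background, which mirrors the by-product view described in the abstract.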


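Object detectors trained with weak annotations are affordable alternatives to fully supervised counterparts. However, there is still a significant performance gap between them. We propose to narrow this gap by fine-tuning a pre-trained weakly supervised detector with a few fully annotated samples automatically selected from the training set using "box-in-box" (BiB), a novel active learning strategy designed specifically to address the well-documented failure modes of weakly supervised detectors. Experiments on the VOC07 and COCO benchmarks show that BiB outperforms other active learning techniques and significantly improves the base weakly supervised detector's performance with only a few fully annotated images per class. BiB reaches 97% of the performance of a fully supervised Fast RCNN with only 10% of fully annotated images on VOC07. On COCO, using on average 10 fully annotated images per class, or equivalently 1% of the training set, it also reduces the performance gap (in AP) between the weakly supervised detector and the fully supervised Fast RCNN by more than 70%, showing a good trade-off between performance and data efficiency. Our code is publicly available at https://github.com/huyvvo/bib.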
